Human-centric Indoor Scene Synthesis Using Stochastic Grammar
Abstract
We present a human-centric method to sample and synthesize 3D room layouts and 2D images thereof, in order to obtain large-scale 2D/3D image data with perfect per-pixel ground truth. An attributed spatial And-Or graph (S-AOG) is proposed to represent indoor scenes. The S-AOG is a probabilistic grammar model in which the terminal nodes are object entities, including rooms, furniture, and supported objects. Human contexts are encoded as contextual relations by Markov random fields (MRFs) on the terminal nodes. We learn the distributions from an indoor scene dataset and sample new layouts using Markov chain Monte Carlo (MCMC). Experiments demonstrate that the proposed method can robustly sample a large variety of realistic room layouts based on three criteria: (i) visual realism compared with a state-of-the-art room arrangement method, (ii) accuracy of the affordance maps with respect to ground truth, and (iii) the functionality and naturalness of synthesized rooms as evaluated by human subjects.
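To make the sampling step concrete, here is a minimal sketch of Metropolis-Hastings (one common MCMC scheme) applied to a toy 2D layout. It is not the paper's S-AOG model: the room size, the object names, the energy terms, and the functions `energy` and `mh_sample` are all assumptions chosen for illustration. The energy only mimics the idea of pairwise contextual relations between terminal nodes, and layouts are drawn from a distribution proportional to exp(-E / T).

```python
import math
import random

ROOM_W, ROOM_H = 4.0, 3.0  # hypothetical room size in metres


def energy(layout):
    """Toy energy (assumed): soft collision penalty plus one contextual term."""
    names = list(layout)
    e = 0.0
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            d = math.dist(layout[a], layout[b])
            # Penalise any pair of objects closer than 0.8 m.
            e += 50.0 * max(0.0, 0.8 - d) ** 2
    # Pairwise relation: the chair prefers to be about 0.9 m from the desk.
    e += (math.dist(layout["chair"], layout["desk"]) - 0.9) ** 2
    return e


def mh_sample(layout, steps=5000, temperature=0.05, step_size=0.3, seed=0):
    """Metropolis-Hastings with a symmetric Gaussian move of one object."""
    rng = random.Random(seed)
    current = dict(layout)
    e_cur = energy(current)
    for _ in range(steps):
        name = rng.choice(sorted(current))
        x, y = current[name]
        nx = x + rng.gauss(0.0, step_size)
        ny = y + rng.gauss(0.0, step_size)
        if not (0.0 <= nx <= ROOM_W and 0.0 <= ny <= ROOM_H):
            continue  # proposal leaves the room: reject it
        proposal = dict(current)
        proposal[name] = (nx, ny)
        e_new = energy(proposal)
        # Accept downhill moves always, uphill moves with probability exp(-dE / T).
        if e_new <= e_cur or rng.random() < math.exp(-(e_new - e_cur) / temperature):
            current, e_cur = proposal, e_new
    return current, e_cur


if __name__ == "__main__":
    start = {"desk": (1.0, 1.0), "chair": (1.0, 1.0), "bed": (1.0, 1.0)}
    final, e_final = mh_sample(start)
    print(final, e_final)
```

Starting all three objects at the same point gives a high-energy layout, and the chain moves toward layouts in which objects are separated and the chair stays near the desk. A grammar-based sampler would additionally propose structural moves such as adding or removing objects, which this sketch omits.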
Similar resources
Supplementary Material for Human-centric Indoor Scene Synthesis Using Stochastic Grammar
Depth estimation. Single-image depth estimation is a fundamental problem in computer vision, which has found broad applications in scene understanding, 3D modeling, and robotics. The problem is challenging since no reliable depth cues are available. In this task, the algorithms output a depth image based on a single RGB input image. To demonstrate the efficacy of our synthetic data, we compare t...
Configurable, Photorealistic Image Rendering and Ground Truth Synthesis by Sampling Stochastic Grammars Representing Indoor Scenes
We propose the configurable rendering of massive quantities of photorealistic images with ground truth for the purposes of training, benchmarking, and diagnosing computer vision models. In contrast to the conventional (crowdsourced) manual labeling of ground truth for a relatively modest number of RGB-D images captured by Kinect-like sensors, we devise a non-trivial configurable pipeline of alg...
Integrating Function, Geometry, Appearance for Scene Parsing
In this paper, we present a Stochastic Scene Grammar (SSG) for parsing 2D indoor images into 3D scene layouts. Our grammar model integrates object functionality, 3D object geometry, and their 2D image appearance in a Function-Geometry-Appearance (FGA) hierarchy. In contrast to the prevailing approach in the literature which recognizes scenes and detects objects through appearance-based classifi...
Human Centered Scene Understanding Based on 3D Long-Term Tracking Data
Scene understanding approaches are mainly based on geometric information and do not consider the behavior of humans. This work introduces a novel human-centric scene understanding approach based on long-term tracking information. Long-term tracking information is filtered and clustered, and areas offering meaningful functionalities for humans are modeled using a kernel density estimation....
A Stochastic Image Grammar for Fine-Grained 3D Scene Reconstruction
This paper presents a stochastic grammar for fine-grained 3D scene reconstruction from a single image. At the heart of our approach is a small number of grammar rules that can describe the most common geometric structures, e.g., two straight lines being co-linear or orthogonal, or a line lying on a planar region. With these grammar rules, we re-frame single-view 3D reconstruction probl...
Publication date: 2018